Abstract. In a recent paper 'The equi-energy sampler with applications in statistical inference
and statistical mechanics' [Ann. Stat. 34 (2006) 1581-1619], Kou, Zhou & Wong have
presented a new stochastic simulation method called the equi-energy (EE) sampler. This technique
is designed to simulate from a probability measure $\pi$, perhaps only known up to a normalizing
constant. The authors demonstrate that the sampler performs well in quite challenging
problems but their convergence results (Theorem 2) appear incomplete. This was pointed out, in
the discussion of the paper, by Atchade & Liu (2006) who proposed an alternative convergence
proof. However, this alternative proof, whilst theoretically correct, does not correspond to the
algorithm that is implemented. In this note we provide a new proof of convergence of the
equi-energy sampler based on the Poisson equation and on the theory developed in Andrieu
et al. (2007) for Non-Linear Markov chain Monte Carlo (MCMC). The objective of this note
is to provide a proof of correctness of the EE sampler when there is only one feeding chain;
the general case requires a much more technical approach than is suitable for a short note. In
addition, we also seek to highlight the difficulties associated with the analysis of this type of
algorithm and present the main techniques that may be adopted to prove its convergence.



1. Introduction 

In this note we consider the convergence properties of a new stochastic simulation technique, 
the equi-energy sampler introduced in Kou et al. (2006). This is a method designed to draw
samples from a probability measure $\pi \in \mathscr{P}(E)$ (where $\mathscr{P}(E)$ denotes the class of probability
measures) on a measurable space $(E, \mathscr{E})$, where $E$ may be a high dimensional space and the density
is known pointwise up to a potentially unknown constant. In particular, the algorithm generates
a non-Markovian stochastic process $\{X_n\}_{n \geq 0}$ whose stationary distribution is ultimately $\pi$; this
algorithm is described fully in Section 2. 

In the paper of Kou et al. (2006), an attempt to analyze the algorithm is made (in Theorem 2).
However, it was noticed in the discussion by Atchade & Liu (2006) that this result is incomplete.
We note the points that were stated by Atchade & Liu and expand upon them; see
Section 3. An important remark is that Atchade & Liu attempt to provide an alternative
convergence result, via a Strong Law of Large Numbers (SLLN) for bounded measurable functions.
Although this proof is correct, the authors study a stochastic process which does not correspond
to the algorithm; this problem is outlined in Section 3.

The objective of this note is to provide some convergence proofs for the EE sampler in a 
simple scenario (one feeding chain). We also note the difficulties associated with the analysis 
of this type of algorithm and present the main methods that can be used to prove the SLLN. 
To avoid unnecessary technicalities and focus on the 'essence' of the proof, strong assumptions
are made, including the uniform ergodicity of some transition kernels. Our proof strategy is via
the Poisson equation (e.g. Glynn & Meyn (1996)) and the techniques developed for Non-Linear
MCMC (Andrieu et al. 2007). That is, the EE sampler is a non-linear MCMC algorithm and
may be analyzed in a similar manner. Our results can be found in Section 4.

2. Notation and Algorithm 

We now outline the notation that is adopted throughout the paper as well as the algorithm 
that is analyzed. 

2.1. Notation. Define a measurable space $(E, \mathscr{E})$, with $\pi \in \mathscr{P}(E)$ (recall $\mathscr{P}(E)$ denotes the
class of probability measures on $(E, \mathscr{E})$) a target probability measure of interest.

For a stochastic process $\{X_n\}_{n \geq 0}$ on $(E^{\mathbb{N}}, \mathscr{E}^{\otimes \mathbb{N}})$, $\mathscr{G}_n = \sigma(X_0, \dots, X_n)$ is the natural filtration,
$\mathbb{P}_\mu$ is taken as a probability law of a stochastic process with initial distribution $\mu$ and $\mathbb{E}_\mu$ the
associated expectation. If $\mu = \delta_x$ (with $\delta$ the Dirac measure) $\mathbb{P}_x$ (resp. $\mathbb{E}_x$) is adopted instead of
$\mathbb{P}_{\delta_x}$ (resp. $\mathbb{E}_{\delta_x}$). We use $X_n \xrightarrow{a.s.}_{\mathbb{P}} X$ to denote almost sure convergence of $X_n$ to $X$. The
equi-energy sampler generates a stochastic process on $(\Omega, \mathscr{F})$, which is defined in the next Section.

Let $\|\eta - \mu\|_{tv} := \sup_{A \in \mathscr{E}} |\eta(A) - \mu(A)|$ denote the total variation distance between $\eta, \mu \in \mathscr{P}(E)$.
Throughout, $K : E \to \mathscr{P}(E)$ is taken as a generic Markov kernel; the standard notations, for
measurable $f : E \to \mathbb{R}$, $K(f)(x) := \int_E f(y) K(x, dy)$ and for $\mu \in \mathscr{P}(E)$, $\mu K(f) := \int_E K(f)(x) \mu(dx)$
are used. Let $f : E \times E \to \mathbb{R}$, then for $\mu \in \mathscr{P}(E)$, $\mu(f)(x) := \int_E f(x, y) \mu(dy)$, with an obvious
extension to higher dimensional spaces. $B_b(E)$ is used to represent the bounded measurable
functions and for $f \in B_b(E)$, $\|f\|_\infty := \sup_{x \in E} |f(x)|$ is used to denote the supremum norm.

We will denote by $K_\mu : \mathscr{P}(E) \times E \to \mathscr{P}(E)$ a generic non-linear Markov kernel and its
invariant measure (given its existence) as $\omega(\mu)$ ($\omega : \mathscr{P}(E) \to \mathscr{P}(E)$). For a sequence of
probability measures $\{\mu_n\}_{n \geq 0}$ we denote the composition $\int_{E^{n-1}} K_{\mu_1}(x, dy_1) \cdots K_{\mu_n}(y_{n-1}, A)$ as
$K_{\mu_1 : \mu_n}(x, A)$. The empirical measure of an arbitrary stochastic process $\{X_n\}_{n \geq 0}$ is defined, at
time $n$, as:

$$S_n(du) := \frac{1}{n+1} \sum_{i=0}^{n} \delta_{X_i}(du).$$

In addition, $a \vee b := \max\{a, b\}$ (resp. $a \wedge b := \min\{a, b\}$). The indicator function of $A \in \mathscr{E}$ is
written $\mathbb{I}_A(x)$. Note also that $\mathbb{N} = \mathbb{Z}^+ \cup \{0\}$, $\mathbb{T}_m := \{1, \dots, m\}$.
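
As an illustration of the empirical measure $S_n$ defined above, the following minimal Python sketch evaluates $S_n(f)$ for a stored path and checks it against the equivalent recursive form $S_n(f) = S_{n-1}(f) + \frac{1}{n+1}[f(X_n) - S_{n-1}(f)]$. The function names and the numerical values are hypothetical and chosen only for this illustration.

```python
def empirical_measure(path, f):
    """S_n(f) = (1 / (n + 1)) * sum_{i=0}^{n} f(X_i), for path = [X_0, ..., X_n]."""
    return sum(f(x) for x in path) / len(path)


def recursive_update(prev_value, n, fx_new):
    """S_n(f) obtained from S_{n-1}(f) and the new evaluation f(X_n)."""
    return prev_value + (fx_new - prev_value) / (n + 1)


path = [0.0, 1.0, 2.0, 3.0]                          # X_0, ..., X_3 (illustrative)
identity = lambda x: x
indicator_A = lambda x: 1.0 if x >= 2.0 else 0.0     # indicator of A = [2, infinity)

mean_value = empirical_measure(path, identity)       # S_3(f) with f(x) = x
mass_of_A = empirical_measure(path, indicator_A)     # S_3(A)

# Rebuild S_3(f) one observation at a time with the recursion.
running = identity(path[0])
for n in range(1, len(path)):
    running = recursive_update(running, n, identity(path[n]))
```

The recursive form is the one that matters in practice, since the sampler only needs to update the stored measure when a new state arrives.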

2.2. Algorithm. We introduce a sequence of probability measures, for $r \geq 2$, $\{\pi_n\}_{n \in \mathbb{T}_r}$, $\pi_n \in
\mathscr{P}(E)$, $n \in \mathbb{T}_r$ and $\pi_r = \pi$ which are assumed to be absolutely continuous, wrt some reference
measure $\lambda^*$, and, in an abuse of notation, write the Radon-Nikodym derivatives as $d\pi_n / d\lambda^*(x) =
\pi_n(x)$ also. The EE sampler will generate a stochastic process $\{Y_n^r\}_{n \geq 0}$, with $Y_n^r = (X_n^1, \dots, X_n^r)$,
with $X_n^i : E \to \mathbb{R}^k$, $i \in \mathbb{T}_r$, $k \geq 1$ (that is $\{Y_n^r\}_{n \geq 0}$ is a stochastic process on $(\Omega, \mathscr{F}) =
((E^r)^{\mathbb{N}}, (\mathscr{E}^{\otimes r})^{\otimes \mathbb{N}})$). Central to the construction of the EE sampler is the concept of the energy
rings; this will correspond to the partition $E = \bigcup_{i=1}^{d} E_i$.

For each $X_n^i$ we associate a non-linear Markov kernel $\{K_{\mu,n}\}_{n \in \mathbb{T}_r}$ with $K_{\mu,1} = K_1$ (i.e. $K_1$ is
an ordinary Markov kernel) and $\mu \in \mathscr{P}(E)$. Additionally, assume that for $i = 2, \dots, r-1$:

$$\omega_i(\pi_{i-1}) K_{\pi_{i-1},i}(dy) = \omega_i(\pi_{i-1})(dy) = \pi_i(dy) \qquad (2.1)$$

and that $\pi_1 K_1 = \pi_1$. Here, it is assumed that, given that we input the invariant probability
measure for $K_{\pi_{i-2},i-1}$ into the non-linear kernel $K_{\cdot,i}$, the target probability measure $\pi_i$ is obtained.
Define:

$$K_{\mu,i}(x, dy) := (1 - \epsilon) K_i(x, dy) + \epsilon Q_{\mu,i}(x, dy) \qquad (2.2)$$

$i = 2, \dots, r$, $\epsilon \in [0,1]$, with $K_i$ a Markov kernel of invariant distribution $\pi_i$ and also:

$$Q_{\mu,i}(x, dy) := \int_E \mu_x(dz) K^S(K_i(dy))(x, z)$$

$$\mu_x(A) := \sum_{i=1}^{d} \mathbb{I}_{E_i}(x) \frac{\mu(E_i \cap A)}{\mu(E_i)}$$

where it is assumed $\mu(E_i) > 0$; let $\mathscr{P}_d(E) = \{\mu \in \mathscr{P}(E) : \mu(E_i) > 0 \ \forall i \in \mathbb{T}_d\}$. Finally define:

$$K^S((x,y), d(x',y')) := \delta_x(dy') \delta_y(dx') \alpha_i(x,y) + \delta_x(dx') \delta_y(dy') [1 - \alpha_i(x,y)]$$

$$\alpha_i(x,y) = 1 \wedge \frac{\pi_i(y) \pi_{i-1}(x)}{\pi_i(x) \pi_{i-1}(y)}$$

which is the swapping kernel. It is easily seen that the kernels (2.2) satisfy the equation (2.1).
However, it is often the case that such a system cannot be simulated exactly. The idea is to
approximate the correct probability measures $\pi_n$ via the empirical measures generated by the
previous chain.
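
The two ingredients of the jump kernel can be sketched in a few lines of Python: the ring-conditioned measure $\mu_x(A)$ when $\mu$ is an empirical measure, and the swap acceptance probability. This is a minimal sketch under stated assumptions: the acceptance ratio is taken to be the usual exchange ratio $1 \wedge [\pi_i(y)\pi_{i-1}(x)] / [\pi_i(x)\pi_{i-1}(y)]$, the rings are described by a user-supplied labelling function, and all function names and densities are hypothetical.

```python
import math


def ring_conditioned_mass(samples, ring_of, x, in_A):
    """mu_x(A) for mu the empirical measure of `samples`:
    the fraction of samples in the ring of x that also lie in A."""
    ring = ring_of(x)
    same_ring = [z for z in samples if ring_of(z) == ring]
    if not same_ring:
        # The text assumes mu(E_i) > 0 for every ring; fail loudly otherwise.
        raise ValueError("the ring containing x has zero empirical mass")
    return sum(1 for z in same_ring if in_A(z)) / len(same_ring)


def swap_probability(x, y, log_pi_i, log_pi_prev):
    """Assumed exchange ratio 1 ^ [pi_i(y) pi_{i-1}(x)] / [pi_i(x) pi_{i-1}(y)],
    computed on the log scale for numerical stability."""
    log_ratio = log_pi_i(y) + log_pi_prev(x) - log_pi_i(x) - log_pi_prev(y)
    return math.exp(min(0.0, log_ratio))


# Illustrative setting: two rings split at 1.0, a Gaussian target and a flatter lower level.
samples = [0.1, 0.4, 1.2, 1.5, 1.9]
ring_of = lambda z: 0 if z < 1.0 else 1
log_pi_i = lambda z: -0.5 * z * z          # unnormalised log density of pi_i
log_pi_prev = lambda z: -0.125 * z * z     # unnormalised log density of pi_{i-1}

mass = ring_conditioned_mass(samples, ring_of, x=1.1, in_A=lambda z: z > 1.4)
alpha_down = swap_probability(0.0, 2.0, log_pi_i, log_pi_prev)
alpha_up = swap_probability(2.0, 0.0, log_pi_i, log_pi_prev)
```

Only unnormalised densities are needed, because the normalising constants cancel in the ratio.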

The algorithm which corresponds to the equi-energy sampler is as follows. Define predetermined
integers $N_1, \dots, N_r$ and assume that for all $i \in \mathbb{T}_r$, $j = 1, \dots, d$ (recall $d$ corresponds to the
number of energy levels) we have $S^i_{N_{1:i}}(E_j) > 0$ with $S^i$ the empirical measure of the $i$th process
and $N_{1:i} = \sum_{j=1}^{i} N_j$. The algorithm is in Figure 1.

0. Set $n = 0$ and $X_0^{1:r} = x_0^{1:r}$, $S_0^l = \delta_{x_0^l}$, $l = 1, \dots, r$. Set $i = 1$.

1. Perform the following for $i = 1$ until $i = r$. Set $j = 1$.

2. Perform the following for $j = 1$ until $j = N_i$, then set $i = i + 1$ and go to 1.

3. Set $n = n + 1$, $k = 1$.

4. Perform the following for $k = 1$ until $k = i$, then set $k = i + 1$ and go to 5.

$X_n^k \sim K_{S_n^{k-1},k}(X_{n-1}^k, \cdot)$, $S_n^k = S_{n-1}^k + \frac{1}{n+1}[\delta_{X_n^k} - S_{n-1}^k]$, set $k = k + 1$ and go to 4.

5. Perform the following for $k = i + 1$ until $k > r$, then set $j = j + 1$ and go to 2.

6. $X_n^k \sim \delta_{X_{n-1}^k}(\cdot)$ then set $k = k + 1$ and go to 5.

Figure 1: An equi-energy sampler. 
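
The staged structure of Figure 1 can be sketched for $r = 2$ on a small finite state space. This is a hedged illustration rather than the authors' implementation: the random-walk Metropolis kernel standing in for $K_i$, the weight vectors, the ring labelling, the fallback used when a ring is still empty, and all function names are assumptions made for the example. The swap probability is taken to be $1 \wedge [\pi_2(z)\pi_1(x)] / [\pi_2(x)\pi_1(z)]$.

```python
import random


def metropolis_step(x, weights, rng):
    """One random-walk Metropolis step on {0, ..., len(weights) - 1} targeting `weights`."""
    y = x + rng.choice((-1, 1))
    if y < 0 or y >= len(weights):
        return x                                   # proposals outside E are rejected
    if rng.random() < min(1.0, weights[y] / weights[x]):
        return y
    return x


def ee_step(x, lower_history, w_hi, w_lo, ring_of, eps, rng):
    """One draw from (1 - eps) K_2 + eps Q_{S,2}, with S the empirical measure of lower_history."""
    if rng.random() < eps:
        candidates = [z for z in lower_history if ring_of(z) == ring_of(x)]
        if candidates:                             # assumption: skip the jump if the ring is empty
            z = rng.choice(candidates)             # z ~ S_x, the ring-conditioned measure
            alpha = min(1.0, (w_hi[z] * w_lo[x]) / (w_hi[x] * w_lo[z]))
            if rng.random() < alpha:
                x = z                              # accepted swap
    return metropolis_step(x, w_hi, rng)           # mutation by K_2 in either case


def ee_sampler(w_hi, w_lo, ring_of, n1, n2, eps, x0=(0, 0), seed=0):
    """Two-level sampler: n1 steps of the feeding chain alone, then n2 joint steps."""
    rng = random.Random(seed)
    x1, x2 = x0
    chain1, chain2 = [x1], [x2]
    for _ in range(n1):                            # stage i = 1: only chain 1 moves
        x1 = metropolis_step(x1, w_lo, rng)
        chain1.append(x1)
    for _ in range(n2):                            # stage i = 2: chain 1 first, then chain 2
        x1 = metropolis_step(x1, w_lo, rng)
        chain1.append(x1)
        x2 = ee_step(x2, chain1, w_hi, w_lo, ring_of, eps, rng)
        chain2.append(x2)
    return chain1, chain2


w_hi = [8.0, 1.0, 0.5, 0.5, 1.0, 8.0]              # unnormalised target pi_2 (bimodal)
w_lo = [w ** 0.5 for w in w_hi]                    # flatter feeding target pi_1
ring_of = lambda z: 0 if w_hi[z] >= 4.0 else 1     # two rings: high and low probability states
chain1, chain2 = ee_sampler(w_hi, w_lo, ring_of, n1=200, n2=300, eps=0.2)
```

The candidate list is rebuilt from the whole history at every jump, which is quadratic in the run length; a practical implementation would keep one running list per ring.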

Remark 1. We point out here that our algorithm is slightly different from that of Kou et al.
There, the EE jump can be seen as using a Metropolis-Hastings (M-H) independence sampler
with proposal $\pi_{i-1}$ constrained to the set $E_i$ currently occupied by the current state (the kernel is
then approximated). We have preferred to do this in a selection/mutation type format (see Del
Moral (2004)) where a value is selected from the empirical measure of the lower chain and then
put through a M-H exchange step. We then allow a possibility of mutation (sampling from $K_i$).
This has been done in order to fit our proof in the framework of Andrieu et al. (2007), which
allows us, below, to refer to minor technical results from that work and hence reduce the length
of this note. It should be noted that, from a technical point of view, changing the algorithm back
to the EE sampler presents no difficulties, in terms of the following arguments. Indeed, the only
real changes to the proofs are some of the technical assumptions in Andrieu et al. (2007) and the
uniform in time drift condition presented there (Proposition 4.1).

Remark 2. In our view, the non-linear kernel interpretation of the equi-energy sampler allows
us to intuitively understand some practical issues associated with the algorithm, whilst perhaps not
requiring a full technical understanding. For example, if there is only one feeding chain, and it is
stopped at some point, then we can observe from equation (2.1) that this algorithm is then biased
(contrary to the point of Kou et al. (2006) pp-16^7, 5th par, although we realize that it is not
possible to store an infinite number of samples).

3. Discussion of the Previous Proofs 

The difficulties of the convergence proofs of Kou et al. (2006) and Atchade & Liu (2006) are
now discussed.

3.1. Theorem 2 of Kou et al. (2006). We begin with the proof of Theorem 2 of Kou et al.
Recall that the Theorem states, under some assumptions, that the steady state distribution of
$\{X_n^i\}_{n \geq 0}$ is $\pi_i$. The authors use induction and start by using the ergodicity of the M-H chain
which verifies the case $r = 1$ and continue from there.

Atchade & Liu state that equation (5) of the proof is not clear, however, we note that the 
equation can indeed be verified (and as stated by Kou et al. (2006) in the rejoinder to the 
discussion (pp-1649)) by using the SLLN (via the induction hypothesis) and bounded convergence 
theorem. 

The main difficulty of the proof is as follows, quoting Kou et al. (2006), pp-1590:

Therefore, under the induction assumption, $X^{(i)}$ is asymptotically equivalent to
a Markovian sequence governed by $S^{(i)}(x, \cdot)$.

Here the kernel $S^{(i)}(x, \cdot)$ is the theoretical kernel corresponding to $K_{\pi_{i-1},i}$. The authors then
state that $S^{(i)}(x, \cdot)$ is an ergodic Markov kernel which then yields the convergence of $X^{(i)}$. This
is the difficulty of the proof: the authors verify that the transitions of the stochastic process are
asymptotically equivalent to those of an ergodic Markov kernel; however, this is not enough to
provide the required convergence of the process. That is, Kou et al. (2006) prove that (suppressing
the notation $N_{1:i-1}$)

$$|K_{S_n^{i-1},i}(x, A) - K_{\pi_{i-1},i}(x, A)| \xrightarrow{a.s.}_{\mathbb{P}^{(i-1)}} 0 \quad \text{as } n \to \infty$$

where $\mathbb{P}^{(i-1)}$ is the probability law of the process with $i - 1$ chains. However, this convergence
property essentially means that when the input probability measure is converging to the
'correct' probability measure $\pi_{i-1}$ then a set-wise convergence of the non-linear kernel $K_{\cdot,i}$ is
induced. This is far from sufficient as the law of the process at iteration $n$ is, for $A \in \mathscr{E}$

where $S_1^{i-1}, \dots, S_n^{i-1}$ are empirical distributions constructed from the same realisation of
the process at level $i - 1$. It is clear that if the algorithm is to converge, then the joint distributions
of $X_{n-\tau}^i, \dots, X_n^i$ for any (in fact increasing with $n$) lag $\tau$ should converge to

which as we shall see is far from trivial. This remark indicates an appropriate approach to a 
proof; via standard Markov chain convergence theorems. As a result, using the arguments of 
Kou et al. (2006), we cannot even say that 

$$|K_{S_n^{i-1} : S_{n+N}^{i-1},i}(x, A) - \pi_i(A)| \xrightarrow{a.s.}_{\mathbb{P}^{(i-1)}} 0 \quad \text{as } n \to \infty$$

via the ergodicity of $K_{\pi_{i-1},i}(x, A)$; i.e. a set-wise convergence of the kernel that is simulated.

3.2. Theorem 3.1 of Atchade & Liu (2006). Atchade & Liu state (pp-1625, in the proof of
Theorem 3.1):

Note that the $i$th chain is actually a non-homogeneous Markov chain with
transition kernels $K_0^{(i)}, K_1^{(i)}, \dots$, where $K_n^{(i)}(x, A) = P(X_{n+1}^{(i)} \in A \mid X_n^{(i)} = x)$.

This statement is not quite accurate. The i th chain is a non-homogeneous Markov chain only 
conditional upon a realization of the previous chain; unconditionally, it is not a Markov chain. 
As a result, Atchade & Liu analyze the process of kernel:

$$K_n^{(i)}(x, dy) = (1 - \epsilon) K_i(x, dy) + \epsilon\, \mathbb{E}\big[R_n^{(i)}(x, dy)\big]$$

where $R_n^{(i)}$ is defined in Atchade & Liu. This is not the kernel corresponding to the algorithm;
the algorithm simulates:

$$Q_{S_n^{i-1},i}(x, dy) = \int_E S_{n,x}^{i-1}(dz) K^S(K_i(dy))(x, z)$$

that is, we do not integrate over the process $\{X_n^{i-1}\}$, we condition upon it. Therefore, the proofs
of Atchade & Liu do not provide a theoretical validation of the equi-energy sampler.

4. Ergodicity Results 

The SLLN is now presented: we have only proved the case when $r = 2$ and this is assumed
hereafter. There are some difficulties in extending our proof to the case $r \geq 3$; these will be
outlined after the proofs. Note that our proof is non-trivial and relies on a SLLN for U-statistics
of stationary ergodic stochastic processes (Aaronson et al. 1996).

4.1. Assumptions. We make the following assumptions (it is assumed that for any $i \in \mathbb{T}_r$,
$j \in \mathbb{T}_d$, $\pi_i(E_j) > 0$ throughout).

(A1) (Stability of Algorithm): There is a universal constant $\theta > 0$, such that for any $n \geq 0$,
$j \in \mathbb{T}_d$, $i \in \mathbb{T}_{r-1}$ we have, recalling that $N_{1:i} = \sum_{j=1}^{i} N_j$:

(A2) (Uniform Ergodicity): The $\{K_n\}_{n \in \mathbb{T}_r}$ are uniformly ergodic Markov kernels with a
one step minorization condition. That is: $\forall n \in \mathbb{T}_r$, $\exists (\phi_n, \nu_n) \in \mathbb{R}^+ \times \mathscr{P}(E)$ such that
$\forall (x, A) \in E \times \mathscr{E}$:

$$K_n(x, A) \geq \phi_n \nu_n(A).$$

(A3) (State-Space Constraint): $E$ is Polish (separable complete metrisable topological space).

4.2. Discussion of Assumptions. The assumptions we make are quite strong. The first
assumption (A1) is used to allow us to bound:

$$\left| \frac{1}{S_{m+1}(E_i)} - \frac{m+2}{(m+1) S_m(E_i)} \right|$$

which will appear in the proof below. This assumption, on the empirical measure, is removed in
Andrieu et al. (2007); however, this is at the cost of a significant increase in the technicalities of
the proof. As a result, (A1) is adopted as an intuitive assumption as it states:

(1) Make sure that $\pi_i(E_j)$ for all $i, j$ is non-negligible.

(2) Let $N_1, \dots, N_{r-1}$ be reasonably large so that we can expect convergence.

The second assumption (A2) might appear strong, but allows us to significantly simplify both
notation and our proofs whilst preserving the 'essence' of the general proof. In addition, this
condition will often be satisfied on finite state spaces. More general assumptions could be used,
at the expense of significant notational and technical complexity. The assumption allows us to
use the following facts:

(1) For any fixed $\mu \in \mathscr{P}_d(E)$, $\exists\, \omega_i(\mu) \in \mathscr{P}(E)$ such that $\omega_i(\mu) K_{\mu,i} = \omega_i(\mu)$.

(2) For any fixed $\mu \in \mathscr{P}_d(E)$, $i \in \mathbb{T}_r$, $\exists \rho \in (0, 1)$, $M < \infty$ such that for any $n \in \mathbb{N}$ we have
$\sup_{x \in E} \|K_{\mu,i}^n(x, \cdot) - \omega_i(\mu)\|_{tv} \leq M \rho^n$.

These properties will help to simplify our proofs below.
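
On a finite state space the geometric bound can be checked directly. The sketch below is an illustration under stated assumptions, not part of the proof: it takes a hypothetical 3-state kernel, computes the one step minorization constant $\phi = \sum_y \min_x K(x, y)$, and compares $\sup_x \|K^n(x, \cdot) - \pi\|_{tv}$ with the classical Doeblin bound $(1 - \phi)^n$. The invariant law is approximated by a row of a high power of the kernel.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def minorization_constant(K):
    """phi = sum_y min_x K(x, y), so that K(x, .) >= phi * nu(.) for a probability nu."""
    n = len(K)
    return sum(min(K[x][y] for x in range(n)) for y in range(n))


def sup_tv_to_invariant(K, n_max, burn_in=200):
    """sup_x ||K^n(x, .) - pi||_tv for n = 0, ..., n_max (pi approximated numerically)."""
    n = len(K)
    power = K
    for _ in range(burn_in):
        power = mat_mul(power, K)
    pi = power[0]                                   # any row of a high power approximates pi
    Kn = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]   # K^0
    distances = []
    for _ in range(n_max + 1):
        # sup_A |eta(A) - mu(A)| equals half of the l1 distance on a finite space
        distances.append(max(0.5 * sum(abs(p - q) for p, q in zip(row, pi)) for row in Kn))
        Kn = mat_mul(Kn, K)
    return distances


K = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]                               # hypothetical kernel
phi = minorization_constant(K)                      # 0.2 + 0.3 + 0.2
tv = sup_tv_to_invariant(K, n_max=10)
bounds = [(1.0 - phi) ** n for n in range(11)]
```

Here the constants of fact (2) can be taken as $M = 1$ and $\rho = 1 - \phi$.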

The final assumption (A3) will be related to some technical arguments in the proof.

4.3. SLLN. We are to establish the convergence of $S_n^r(f) \xrightarrow{a.s.}_{\mathbb{P}_{x^{1:r}}} \pi_r(f)$ for some $f$ to be defined
in the proof and $n \geq N_{1:r-1}$.

4.3.1. Strategy of the Proof. Our approach is to consider $S_{n,r}^{\omega} = 1/(n - N_{1:r-1} + 1) \sum_{j=N_{1:r-1}}^{n} \omega_r(S_j^{r-1})$
and adopt the decomposition:

$$S_n^r(f) - \pi_r(f) = S_n^r(f) - S_{n,r}^{\omega}(f) + S_{n,r}^{\omega}(f) - \pi_r(f). \qquad (4.3)$$

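The analysis of the first term on the RHS of (4.3) relies upon a martingale argument using the
classical Poisson's equation solution:

$$f(X_n^r) - \omega_r(S_n^{r-1})(f) = \hat{f}_{S_n^{r-1}}(X_n^r) - K_{S_n^{r-1},r}(\hat{f}_{S_n^{r-1}})(X_n^r)$$

where $\hat{f}_{S_n^{r-1}}$ is a solution of the Poisson equation. Indeed, the first term on the RHS of (4.3) can
be rewritten:

$$(n - N_{1:r-1} + 1)[S_n^r - S_{n,r}^{\omega}](f) = M_{n+1}^r + \sum_{m=N_{1:r-1}}^{n} [\hat{f}_{S_{m+1}^{r-1}}(X_{m+1}^r) - \hat{f}_{S_m^{r-1}}(X_{m+1}^r)] + \hat{f}_{S_0^{r-1}}(X_0^r) - \hat{f}_{S_{n+1}^{r-1}}(X_{n+1}^r) \qquad (4.4)$$

where

$$M_{n+1}^r = \sum_{m=N_{1:r-1}}^{n} [\hat{f}_{S_m^{r-1}}(X_{m+1}^r) - K_{S_m^{r-1},r}(\hat{f}_{S_m^{r-1}})(X_m^r)]$$

$$\hat{f}_{S_m^{r-1}}(X_{m+1}^r) = \sum_{n \in \mathbb{N}} [K_{S_m^{r-1},r}^n(f)(X_{m+1}^r) - \omega_r(S_m^{r-1})(f)] \qquad (4.5)$$

and $\{M_n^r, \mathscr{G}_n\}_{n \geq 0}$ is a martingale and $M_n^r := 0$, for $0 \leq n \leq N_{1:r-1}$. Recall that (4.5) is a solution
to the Poisson equation, which will exist under our assumptions above.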
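
For a fixed kernel on a finite state space, the series solution of the Poisson equation can be evaluated numerically. The following sketch is a minimal illustration under stated assumptions (a hypothetical 2-state kernel with known invariant law, and a truncation of the series at a fixed number of terms): it builds $\hat{f} = \sum_{n \geq 0} [K^n(f) - \pi(f)]$ and checks that $\hat{f} - K(\hat{f}) = f - \pi(f)$.

```python
def apply_kernel(K, f):
    """K(f)(x) = sum_y K(x, y) f(y) on a finite state space."""
    return [sum(K[x][y] * f[y] for y in range(len(f))) for x in range(len(K))]


def poisson_solution(K, pi, f, n_terms=500):
    """Truncated series f_hat(x) = sum_{n=0}^{n_terms-1} [K^n(f)(x) - pi(f)]."""
    pi_f = sum(p * v for p, v in zip(pi, f))
    f_hat = [0.0] * len(f)
    Kn_f = list(f)                                  # K^0(f) = f
    for _ in range(n_terms):
        f_hat = [acc + (val - pi_f) for acc, val in zip(f_hat, Kn_f)]
        Kn_f = apply_kernel(K, Kn_f)
    return f_hat, pi_f


K = [[0.9, 0.1],
     [0.2, 0.8]]                                    # hypothetical uniformly ergodic kernel
pi = [2.0 / 3.0, 1.0 / 3.0]                         # its invariant law: pi K = pi
f = [1.0, 0.0]

f_hat, pi_f = poisson_solution(K, pi, f)
K_f_hat = apply_kernel(K, f_hat)
# Residual of the Poisson equation f - pi(f) = f_hat - K(f_hat), state by state.
residual = [(f[x] - pi_f) - (f_hat[x] - K_f_hat[x]) for x in range(2)]
gap = f_hat[0] - f_hat[1]                           # equals 1 / (1 - 0.7) for this kernel
```

The series converges geometrically because the kernel is uniformly ergodic, which is why a fixed truncation is adequate here.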

The proof will deal with the martingale via the Burkholder inequality and the fluctuations of
the solution of the Poisson equation due to the evolution of the empirical measure (4.5) using
continuity properties of the kernel $Q_\mu$. The bias term $S_{n,r}^{\omega}(f) - \pi_r(f)$ is controlled by a SLLN
for U-statistics of stationary ergodic stochastic processes.

4.3.2. Main Result. 

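Theorem 4.1. Assume (A1-A2). Then for any $p \geq 1$, $\exists B_p < \infty$ such that for any $n \geq N_{1:r-1}$
and $f \in B_b(E)$ we have that:

$$\mathbb{E}_{x^{1:r}}\big[ |[S_n^r - S_{n,r}^{\omega}](f)|^p \big]^{1/p} \leq \frac{B_p \|f\|_\infty}{(n - N_{1:r-1} + 1)^{1/2}};$$

if, in addition, (A3) holds then for any $f \in B_b(E)$:

$$S_n^r(f) \xrightarrow{a.s.}_{\mathbb{P}_{x^{1:r}}} \pi_r(f).$$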

Proof. Our proof relies heavily upon the theory of Andrieu et al. (2007). Note that, under (A2)
and, for any fixed $\mu \in \mathscr{P}_d(E)$, the uniform ergodicity of the kernel $K_{\mu,i}$ allows us to use the
methods of Andrieu et al. (2007). We will follow the proof of Theorem 6.5 of that paper. In
order to prove the SLLN in the paper, the authors combine a series of technical results. The
first of these is the Lipschitz continuity of the kernel $Q_\mu$; we establish the result for bounded
functions and the particular kernel considered here. To simplify the notation, we remove the
sub/superscripts from the various objects below.
Let $f \in B_b(E)$ and $\mu, \xi \in \mathscr{P}_d(E)$, then we have:

$$|Q_\mu(f)(x) - Q_\xi(f)(x)| = \sup_{(x,y) \in E^2} |K^S(K(f))(x,y) - Q_\mu(f)(x)| \times \left| \int_{E \times E} \frac{K^S(K(f))(x',y) - Q_\mu(f)(x)}{\sup_{(x,y) \in E^2} |K^S(K(f))(x,y) - Q_\mu(f)(x)|} \big[ \mu_x(dy) \times \delta_x(dx') - \xi_x(dy) \times \delta_x(dx') \big] \right|$$

$$\leq 2 \|f\|_\infty \sup_{x \in E} \|\mu_x - \xi_x\|_{tv}.$$



We then note that Propositions 6.1 and 6.2 (bounding the solution of the Poisson equation
and martingale in the $L_p$ norm) of Andrieu et al. (2007) are proved in the same manner. That
is, in a similar way to the proofs constructed there, we can show that:

$$\mathbb{E}_{x^{1:r}}\big[ |\hat{f}_{S_m}(X_{m+1})|^p \big]^{1/p} \leq M \|f\|_\infty$$

$$\mathbb{E}_{x^{1:r}}\big[ |M_n|^p \big]^{1/p} \leq M \|f\|_\infty n^{1/2}.$$



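As a result, the verification of Proposition 6.3 (bounding the fluctuations of the Poisson equation
due to the evolution of the empirical measure) and Theorem 6.5 (the SLLN) are required.
We begin with the equation (4.5): the bound is proved by establishing:

$$|S_{m+1,x}(f) - S_{m,x}(f)| \leq \frac{M \|f\|_\infty}{m+2} \qquad (4.6)$$

for $M < \infty$ some constant and any $f \in B_b(E)$. Consider

$$|S_{m+1,x}(f) - S_{m,x}(f)| = \left| \sum_{i=1}^{d} \mathbb{I}_{E_i}(x) \left[ \frac{S_{m+1}(\mathbb{I}_{E_i} f)}{S_{m+1}(E_i)} - \frac{S_m(\mathbb{I}_{E_i} f)}{S_m(E_i)} \right] \right|$$

$$= \left| \sum_{i=1}^{d} \mathbb{I}_{E_i}(x) \left[ \frac{f(x_{m+1}) \mathbb{I}_{E_i}(x_{m+1})}{(m+2) S_{m+1}(E_i)} + \frac{m+1}{m+2} S_m(\mathbb{I}_{E_i} f) \left\{ \frac{1}{S_{m+1}(E_i)} - \frac{m+2}{(m+1) S_m(E_i)} \right\} \right] \right|.$$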

Now, since:

$$\left| \frac{1}{S_{m+1}(E_i)} - \frac{m+2}{(m+1) S_m(E_i)} \right| = \frac{|(m+1) S_m(E_i) - \delta_{X_{m+1}}(E_i) - (m+1) S_m(E_i)|}{(m+1) S_m(E_i) S_{m+1}(E_i)} \leq \frac{1}{(m+1) S_m(E_i) S_{m+1}(E_i)} \leq \frac{1}{(m+1) \theta^2}$$

it follows that:

$$|S_{m+1,x}(f) - S_{m,x}(f)| \leq \|f\|_\infty \left[ \frac{1}{(m+2) \theta} + \frac{1}{(m+2) \theta^2} \right] \leq \frac{M \|f\|_\infty}{m+2}$$

as required.

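To bound the fluctuations of the Poisson equation, the decomposition (Proposition B.5) in
Andrieu et al. (2007) is adopted, along with Minkowski's inequality:

$$\mathbb{E}_{x^{1:r}}\big[ |\hat{f}_{S_{m+1}}(X_{m+1}) - \hat{f}_{S_m}(X_{m+1})|^p \big]^{1/p} \leq \sum_{n \in \mathbb{N}} \mathbb{E}_{x^{1:r}}\Big[ \Big| \sum_{i=0}^{n-1} [K_{S_{m+1}}^i - \omega(S_{m+1})](K_{S_{m+1}} - K_{S_m})[K_{S_m}^{n-i-1} - \omega(S_m)](f)(X_{m+1}) \Big|^p \Big]^{1/p}$$

$$+ \sum_{n \in \mathbb{N}} \mathbb{E}_{x^{1:r}}\big[ |[\omega(S_{m+1}) - \omega(S_m)](K_{S_m}^n - \omega(S_m))(f)|^p \big]^{1/p}.$$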
To bound the first expression on the RHS, we can use the fact that, for a fixed (deterministic)
pair of empirical measures $S_m, S_{m+1} \in \mathscr{P}_d(E)$ and for any $x \in E$:

$$|[K_{S_{m+1}}^i - \omega(S_{m+1})](K_{S_{m+1}} - K_{S_m})[K_{S_m}^{n-i-1} - \omega(S_m)](f)(x)| \leq M \rho^i \|(K_{S_{m+1}} - K_{S_m})[K_{S_m}^{n-i-1} - \omega(S_m)](f)\|_\infty$$

and further, for any $x \in E$:

$$|(K_{S_{m+1}} - K_{S_m})[K_{S_m}^{n-i-1} - \omega(S_m)](f)(x)| \leq \frac{M \|[K_{S_m}^{n-i-1} - \omega(S_m)](f)\|_\infty}{m+2}$$

due to the Lipschitz continuity of $Q$ and the bound (4.6); therefore:

$$\|[K_{S_{m+1}}^i - \omega(S_{m+1})](K_{S_{m+1}} - K_{S_m})[K_{S_m}^{n-i-1} - \omega(S_m)](f)\|_\infty \leq \frac{M \rho^{n-1} \|f\|_\infty}{m+2}.$$



Since, due to (A1), this property holds almost surely, it is possible to bound the first expression.
The second expression is dealt with in a similar manner, using the inequality (see Andrieu et
al. (2007)):

$$\|[\omega(S_{m+1}) - \omega(S_m)](f)\|_\infty \leq M \|[K_{S_{m+1}} - K_{S_m}](f)\|_\infty.$$

This result can be obtained by the continuity of invariant measures of uniformly ergodic Markov
kernels indexed by a parameter.

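To complete the first part of the proof, we can use the manipulations of Del Moral & Miclo
(2004), Proposition 3.3, to yield:

$$\mathbb{E}_{x^{1:r}}\big[ |[S_n^r - S_{n,r}^{\omega}](f)|^p \big]^{1/p} \leq \frac{B_p \|f\|_\infty}{(n - N_{1:r-1} + 1)^{1/2}}.$$

To control the bias $S_{n,r}^{\omega}(f) - \pi_r(f)$ when $r = 2$, the following decomposition is adopted:

$$|[\omega(S_m) - \omega(\pi_1)](f)| \leq |[K_{S_m}^q - K_{\pi_1}^q](f)| + |[\omega(S_m) - K_{S_m}^q](f)| + |[K_{\pi_1}^q - \omega(\pi_1)](f)|.$$

Due to the uniform ergodicity bound $\|K_\mu^q - \omega(\mu)\|_{tv} \leq M \rho^q$ we will show that for any $q \in \mathbb{N}$:

$$\lim_{m \to \infty} |[K_{S_m}^q - K_{\pi_1}^q](f)| = 0 \quad \mathbb{P}_{x^{1:r}}\text{-a.s.} \qquad (4.7)$$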

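Let $\epsilon = 1$; the general case is dealt with below. Let $\mu \in \mathscr{P}_d(E)$, and for simplicity write
$K^S(K(f) \times 1)(x,y) := P(f)(x,y)$, $f \in B_b(E)$, then we will prove by induction that:

$$\underbrace{Q_\mu Q_\mu \cdots Q_\mu}_{q \text{ times}}(f)(x) = \sum_{(i_1, \dots, i_q) \in \mathbb{T}_d^q} \frac{\mathbb{I}_{E_{i_1}}(x)}{\prod_{j=1}^{q} \mu(E_{i_j})} \mu^{\otimes q}\Big[ \Big( \prod_{j=1}^{q} \mathbb{I}_{E_{i_j}} \Big) \underbrace{P(\mathbb{I}_{E_{i_2}} P(\mathbb{I}_{E_{i_3}} \cdots P(f)))}_{q-1 \text{ terms}} \Big](x) \qquad (4.8)$$

where a composition of the $P$ kernels is defined as:

$$P^q(f)(x, x_{1:q}) := \int P((x, x_1), dy_1) P((y_1, x_2), dy_2) \cdots P((y_{q-1}, x_q), dy_q) f(y_q).$$

For $q = 1$ (4.8) clearly holds, so assume for $q - 1$ and consider $q$:

$$Q_\mu Q_\mu \cdots Q_\mu(f)(x) = \sum_{(i_1, \dots, i_{q-1}) \in \mathbb{T}_d^{q-1}} \frac{\mathbb{I}_{E_{i_1}}(x)}{\prod_{j=1}^{q-1} \mu(E_{i_j})} \mu^{\otimes (q-1)}\Big[ \Big( \prod_{j=1}^{q-1} \mathbb{I}_{E_{i_j}} \Big) P(\mathbb{I}_{E_{i_2}} \cdots P(\mathbb{I}_{E_{i_{q-1}}} P(Q_\mu(f)))) \Big](x).$$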

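To continue the proof, consider:

$$P(Q_\mu(f))(x, x_1) = \int_E P((x, x_1), dy_1) \int_E \mu_{y_1}(dx_2) P(f)(y_1, x_2)$$

$$= \int_E P((x, x_1), dy_1) \sum_{i=1}^{d} \frac{\mathbb{I}_{E_i}(y_1)}{\mu(E_i)} \int_E \mathbb{I}_{E_i}(x_2) P(f)(y_1, x_2) \mu(dx_2)$$

$$= \sum_{i=1}^{d} \frac{1}{\mu(E_i)} \mu(\mathbb{I}_{E_i} P(\mathbb{I}_{E_i} P(f)))(x, x_1).$$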
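
Thus, due to the above equation:

$$Q_\mu Q_\mu \cdots Q_\mu(f)(x) = \sum_{(i_1, \dots, i_{q-1}) \in \mathbb{T}_d^{q-1}} \frac{\mathbb{I}_{E_{i_1}}(x)}{\prod_{j=1}^{q-1} \mu(E_{i_j})} \mu^{\otimes (q-1)}\Big[ \Big( \prod_{j=1}^{q-1} \mathbb{I}_{E_{i_j}} \Big) P\Big( \mathbb{I}_{E_{i_2}} \cdots P\Big( \sum_{i_q=1}^{d} \frac{1}{\mu(E_{i_q})} \mu(\mathbb{I}_{E_{i_q}} P(\mathbb{I}_{E_{i_q}} P(f))) \Big) \Big) \Big](x)$$

$$= \sum_{(i_1, \dots, i_q) \in \mathbb{T}_d^q} \frac{\mathbb{I}_{E_{i_1}}(x)}{\prod_{j=1}^{q} \mu(E_{i_j})} \mu^{\otimes (q-1)}\Big[ \Big( \prod_{j=1}^{q-1} \mathbb{I}_{E_{i_j}} \Big) P(\mathbb{I}_{E_{i_2}} \cdots P(\mu(\mathbb{I}_{E_{i_q}} P(\mathbb{I}_{E_{i_q}} P(f))))) \Big](x).$$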



Application of Fubini's theorem yields the desired result. 
To prove, for $\epsilon = 1$, that (4.7) holds, observe that:

Application of Theorem U and Proposition 2.8 of Aaronson et al. (1996) (along with the
theorem for almost sure convergence of continuous transformations of almost surely convergent
random variables) yields the desired result. Firstly, note that these are results associated with the
almost sure convergence of U- and V- (von Mises) statistics; this is where (A3) is required.
Secondly, we remark that it is not required that the auxiliary process is started in its stationary
regime (as stated in the result of Aaronson et al. (1996)): we can adopt a coupling argument
for uniformly ergodic Markov chains, along the lines of Andrieu et al. (2007) (Theorem 6.5 and
Proposition C.1). To complete the proof for $\epsilon \in (0, 1)$, we note the following decomposition for
iterates of mixtures of Markov kernels $K$ and $P$:



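$$((1 - \epsilon) K + \epsilon P)^n(x, dy) = \sum_{l=0}^{n} \epsilon^l (1 - \epsilon)^{n-l} \sum_{(\alpha_1, \dots, \alpha_n) \in \mathcal{S}_l} K^{1-\alpha_1} P^{\alpha_1} \cdots K^{1-\alpha_n} P^{\alpha_n}(x, dy)$$

where $\mathcal{S}_l = \{(\alpha_1, \dots, \alpha_n) \in \{0,1\}^n : \sum_{j=1}^{n} \alpha_j = l\}$; there is no difficulty in extending the result, using the
bounded convergence theorem where required.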



□ 
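
The decomposition for iterates of a mixture of two kernels, used at the end of the proof, can be verified numerically on a small example. The sketch below is only an illustration: the two 2-state kernels, the value of $\epsilon$ and the function names are hypothetical. It compares the direct matrix power with the sum over all binary words $(\alpha_1, \dots, \alpha_n)$, where $\alpha_j = 1$ selects $P$ and $\alpha_j = 0$ selects $K$ at position $j$.

```python
from itertools import product


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mixture_power(K, P, eps, n):
    """Direct computation of ((1 - eps) K + eps P)^n."""
    size = len(K)
    mix = [[(1 - eps) * K[i][j] + eps * P[i][j] for j in range(size)] for i in range(size)]
    out = identity(size)
    for _ in range(n):
        out = mat_mul(out, mix)
    return out


def mixture_expansion(K, P, eps, n):
    """Sum over alpha in {0,1}^n of eps^l (1 - eps)^(n - l) times the ordered kernel product."""
    size = len(K)
    total = [[0.0] * size for _ in range(size)]
    for alpha in product((0, 1), repeat=n):
        n_jumps = sum(alpha)                        # l = number of P factors in the word
        word = identity(size)
        for a in alpha:                             # the order matters: K and P need not commute
            word = mat_mul(word, P if a == 1 else K)
        weight = eps ** n_jumps * (1 - eps) ** (n - n_jumps)
        total = [[total[i][j] + weight * word[i][j] for j in range(size)] for i in range(size)]
    return total


K = [[0.9, 0.1], [0.2, 0.8]]
P = [[0.5, 0.5], [0.7, 0.3]]
direct = mixture_power(K, P, eps=0.3, n=5)
expanded = mixture_expansion(K, P, eps=0.3, n=5)
max_gap = max(abs(direct[i][j] - expanded[i][j]) for i in range(2) for j in range(2))
```

Grouping the words by the number of $P$ factors recovers the outer sum over $l$ in the displayed identity.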



Remark 3. In the proof we have adopted a decomposition that has naturally led to the use
of SLLN for U-statistics. Essentially, the algorithm requires that the invariant measures
converge to the desired distribution, and this is manifested, in our proof, via the iterates of the
non-linear kernel. This is the main difficulty in proving the SLLN for the equi-energy sampler.
An alternative approach, via uniform SLLN, may also be adopted, possibly at the cost of more
abstract assumptions; see Del Moral (2004) for example, in the case of particle approximations
of Feynman-Kac formulae.

Remark 4. We note that it is possible to extend our proof, via a density argument (see Del Moral
(1998)), for a related algorithm, (NL3) of Andrieu et al. (2007), with $r - 1$ feeding chains, but that
this cannot be used for the equi-energy sampler, due to the fact that the indicator functions in the
definition of the kernel (2.2) are not continuous. In general, a proof by induction requires more
complicated arguments and as a result, we feel that the convergence of the equi-energy sampler, as
well as the convergence rate (as brought up in the discussion of Kou et al. (2006)) are non-trivial
research problems.